Goto

Collaborating Authors

 uncalibrated camera


Multiview Human Body Reconstruction from Uncalibrated Cameras

Neural Information Processing Systems

We present a new method to reconstruct 3D human body pose and shape by fusing visual features from multiview images captured by uncalibrated cameras. Existing multiview approaches often use spatial camera calibration (intrinsic and extrinsic parameters) to geometrically align and fuse visual features. Despite remarkable performances, the requirement of camera calibration restricted their applicability to real-world scenarios, e.g., reconstruction from social videos with wide-baseline cameras. We address this challenge by leveraging the commonly observed human body as a semantic calibration target, which eliminates the requirement of camera calibration. Specifically, we map per-pixel image features to a canonical body surface coordinate system agnostic to views and poses using dense keypoints (correspondences). This feature mapping allows us to semantically, instead of geometrically, align and fuse visual features from multiview images. We learn a self-attention mechanism to reason about the confidence of visual features across and within views. With fused visual features, a regressor is learned to predict the parameters of a body model. We demonstrate that our calibration-free multiview fusion method reliably reconstructs 3D body pose and shape, outperforming state-of-the-art single view methods with post-hoc multiview fusion, particularly in the presence of non-trivial occlusion, and showing comparable accuracy to multiview methods that require calibration.


Supplementary Material Multiview Human Body Reconstruction from Uncalibrated Cameras

Neural Information Processing Systems

We randomly select samples from them with a ratio of 0.8: 0.2 Each method reconstruct a unified mesh from multiview inputs. It comes with ground truth 3D pose and follow its standard split for training and testing. MannequinChallenge is captured by a moving hand-held camera while subjects stay still in daily life scenarios. We use annotations provided by Leroy et al. [ There are 45.3M learnable parameters in our architecture, among them 12.6M are used for the We test run time for a different number of views within a frame. The graph convolutional mesh decoder has 7 layers, each with 256 channels.


Multiview Human Body Reconstruction from Uncalibrated Cameras

Neural Information Processing Systems

We present a new method to reconstruct 3D human body pose and shape by fusing visual features from multiview images captured by uncalibrated cameras. Existing multiview approaches often use spatial camera calibration (intrinsic and extrinsic parameters) to geometrically align and fuse visual features. Despite remarkable performances, the requirement of camera calibration restricted their applicability to real-world scenarios, e.g., reconstruction from social videos with wide-baseline cameras. We address this challenge by leveraging the commonly observed human body as a semantic calibration target, which eliminates the requirement of camera calibration. Specifically, we map per-pixel image features to a canonical body surface coordinate system agnostic to views and poses using dense keypoints (correspondences). This feature mapping allows us to semantically, instead of geometrically, align and fuse visual features from multiview images.


A Complementary Framework for Human-Robot Collaboration with a Mixed AR-Haptic Interface

Yan, Xiangjie, Jiang, Yongpeng, Chen, Chen, Gong, Leiliang, Ge, Ming, Zhang, Tao, Li, Xiang

arXiv.org Artificial Intelligence

There is invariably a trade-off between safety and efficiency for collaborative robots (cobots) in human-robot collaborations. Robots that interact minimally with humans can work with high speed and accuracy but cannot adapt to new tasks or respond to unforeseen changes, whereas robots that work closely with humans can but only by becoming passive to humans, meaning that their main tasks suspended and efficiency compromised. Accordingly, this paper proposes a new complementary framework for human-robot collaboration that balances the safety of humans and the efficiency of robots. In this framework, the robot carries out given tasks using a vision-based adaptive controller, and the human expert collaborates with the robot in the null space. Such a decoupling drives the robot to deal with existing issues in task space (e.g., uncalibrated camera, limited field of view) and in null space (e.g., joint limits) by itself while allowing the expert to adjust the configuration of the robot body to respond to unforeseen changes (e.g., sudden invasion, change of environment) without affecting the robot's main task. Additionally, the robot can simultaneously learn the expert's demonstration in task space and null space beforehand with dynamic movement primitives (DMP). Therefore, an expert's knowledge and a robot's capability are both explored and complementary. Human demonstration and involvement are enabled via a mixed interaction interface, i.e., augmented reality (AR) and haptic devices. The stability of the closed-loop system is rigorously proved with Lyapunov methods. Experimental results in various scenarios are presented to illustrate the performance of the proposed method.